Improving real-time human pose estimation from multi-view video
Author
Abstract
Capturing human motion is a key problem in computer vision because of the wide range of applications that can benefit from the acquired data. Motion capture is used to identify people by their gait, to interact with computers through gestures, to improve the performance of athletes, to diagnose orthopaedic patients, and to create virtual characters with more natural-looking motion in movies and games. These are only a few of the possible applications. In some of these areas it is important that data acquisition is not constrained by the markers or wearable sensors traditionally used in commercial motion capture systems. Some applications, such as perceptive user interfaces and gait recognition, also need low latency and real-time performance. Human pose estimation is defined as the process of estimating the configuration of the underlying skeletal structure of the human body. This dissertation proposes several algorithms that together form a real-time pose estimation pipeline. The input to the pipeline is a set of images captured with a calibrated multi-camera system, and the output is the 3D positions of 25 joints in the global coordinate frame. The steps of the pipeline are: a) subtract the background from the images to create silhouettes; b) reconstruct the volume occupied by the performer from the silhouette images; c) extract skeleton curves from the volume; d) identify extremities and segment the skeletal curves into body parts; and e) fit a model to the labelled data. The pipeline initialises automatically and can recover from estimation errors.
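Steps a) and b) of the pipeline can be illustrated with a small sketch. This is not the dissertation's implementation: it assumes a simple per-pixel threshold for background subtraction, greyscale images stored as nested lists, and cameras represented by hypothetical projection functions that map a 3D point to a pixel. The names `subtract_background`, `carve_visual_hull`, `front`, and `side` are illustrative only.

```python
from itertools import product
from typing import Callable, List, Sequence, Tuple

Image = List[List[float]]   # greyscale image: rows of pixel intensities
Mask = List[List[bool]]     # silhouette: True where the performer is visible
Point = Tuple[int, int, int]
Projection = Callable[[Point], Tuple[int, int]]  # 3D point -> (row, col)


def subtract_background(frame: Image, background: Image,
                        threshold: float = 0.1) -> Mask:
    """Step a): mark pixels that differ from the background by more than threshold."""
    return [
        [abs(f - b) > threshold for f, b in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]


def inside(mask: Mask, pixel: Tuple[int, int]) -> bool:
    """True if the pixel lies within the image bounds and on the silhouette."""
    row, col = pixel
    return 0 <= row < len(mask) and 0 <= col < len(mask[0]) and mask[row][col]


def carve_visual_hull(voxels: Sequence[Point], silhouettes: Sequence[Mask],
                      projections: Sequence[Projection]) -> List[Point]:
    """Step b): keep a voxel only if it projects inside every silhouette."""
    return [
        v for v in voxels
        if all(inside(m, p(v)) for m, p in zip(silhouettes, projections))
    ]


# Toy setup (assumption): two orthographic cameras viewing a 4x4x4 voxel grid.
def front(p: Point) -> Tuple[int, int]:
    return (p[1], p[0])  # looks along z: row = y, col = x


def side(p: Point) -> Tuple[int, int]:
    return (p[1], p[2])  # looks along x: row = y, col = z


background = [[0.0] * 4 for _ in range(4)]
frame_front = [row[:] for row in background]
frame_side = [row[:] for row in background]
frame_front[1][2] = 1.0  # performer seen at y=1, x=2
frame_side[1][3] = 1.0   # performer seen at y=1, z=3

silhouettes = [subtract_background(f, background) for f in (frame_front, frame_side)]
voxels = list(product(range(4), repeat=3))
hull = carve_visual_hull(voxels, silhouettes, [front, side])
print(hull)
```

In this toy scene only the voxel that both cameras agree on survives the carving. A real system would use calibrated perspective projections and a more robust background model.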
There are four main contributions of the research effort presented in this dissertation: a) a toolset for evaluating shape-from-silhouette-based pose estimation algorithms using synthetic motion sequences generated from real motion capture data; b) a fully parallel thinning algorithm implemented on the Graphics Processing Unit (GPU) that can skeletonise voxel volumes in real time; c) a real-time pose estimation algorithm that builds a tree structure segmented into body parts from skeleton data; and d) a constraint algorithm that can fit an articulated model to a labelled tree structure in real time.
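To give a feel for what fitting an articulated model under constraints can involve, the sketch below enforces fixed bone lengths on a set of joint positions by iterative relaxation. This is a generic technique chosen for illustration, not the constraint algorithm of the dissertation; the joint names, rest lengths, and the function `enforce_bone_lengths` are all assumptions.

```python
import math
from typing import Dict, List, Tuple

Vec = Tuple[float, float, float]
Bone = Tuple[str, str, float]  # (joint a, joint b, rest length)


def enforce_bone_lengths(joints: Dict[str, Vec], bones: List[Bone],
                         iterations: int = 50) -> Dict[str, Vec]:
    """Move both ends of each bone until its length approaches the rest length."""
    pos = {name: list(p) for name, p in joints.items()}
    for _ in range(iterations):
        for a, b, rest in bones:
            delta = [pb - pa for pa, pb in zip(pos[a], pos[b])]
            dist = math.sqrt(sum(d * d for d in delta))
            if dist == 0.0:
                continue  # direction undefined for coincident joints; skip
            # Split the length error equally between the two end points.
            correction = 0.5 * (dist - rest) / dist
            for i in range(3):
                pos[a][i] += correction * delta[i]
                pos[b][i] -= correction * delta[i]
    return {name: (p[0], p[1], p[2]) for name, p in pos.items()}


def bone_length(joints: Dict[str, Vec], a: str, b: str) -> float:
    return math.dist(joints[a], joints[b])


# Hypothetical noisy joint estimates for one arm, with unit rest lengths.
estimate = {
    "shoulder": (0.0, 0.0, 0.0),
    "elbow": (2.0, 0.0, 0.0),
    "wrist": (2.0, 3.0, 0.0),
}
arm = [("shoulder", "elbow", 1.0), ("elbow", "wrist", 1.0)]
fitted = enforce_bone_lengths(estimate, arm)
print(bone_length(fitted, "shoulder", "elbow"), bone_length(fitted, "elbow", "wrist"))
```

Each pass corrects one bone at a time, so a correction to one bone can disturb its neighbour. Repeating the passes lets the errors shrink until all lengths are close to their rest values.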
Similar resources
Multi-camera 3D human pose estimation by fitting the projection of an articulated 3D skeleton model to silhouette images
Automatic capture and analysis of human motion from images or video is an important issue in computer vision because of the vast number of applications in animation, surveillance, biomechanics, human-computer interaction, entertainment, and the game industry. In these applications, 3D human pose estimation is an essential part. Therefore, its accuracy has a great effect on the perform...
Towards Accurate Markerless Human Shape and Pose Estimation over Time
Existing markerless motion capture methods often assume known backgrounds, static cameras, and sequence specific motion priors, limiting their application scenarios. Here we present a fully automatic method that, given multi-view videos, estimates 3D human pose and body shape. We take the recently proposed SMPLify method [12] as the base method and extend it in several ways. First we fit a 3D h...
Real-Time Pose Estimation Using Constrained Dynamics
Pose estimation in the context of human motion analysis is the process of approximating the body configuration in each frame of a motion sequence. We propose a novel pose estimation method based on fitting a skeletal model to tree structures built from skeletonised visual hulls reconstructed from multi-view video. The pose is estimated independently in each frame, hence the method can recover f...
Human 3D Pose Estimation and Activity Recognition from Multi-View Videos: Comparative Explorations of Recent Developments
This paper presents a review and comparative study of recent multi-view approaches for human 3D pose estimation and activity recognition. We discuss the application domain of human pose estimation and activity recognition and the associated requirements, covering: advanced Human-Computer Interaction (HCI), assisted living, gesture-based interactive games, intelligent driver assistance systems, ...
Dense 3D face alignment from 2D video for real-time use
To enable real-time, person-independent 3D registration from 2D video, we developed a 3D cascade regression approach in which facial landmarks remain invariant across pose over a range of approximately 60 degrees. From a single 2D image of a person’s face, a dense 3D shape is registered in real time for each frame. The algorithm utilizes a fast cascade regression framework trained on high-resol...